Skip to content

Conversation

@rem1776
Copy link
Contributor

@rem1776 rem1776 commented Sep 22, 2025

Description
draft for the io offloading changes

How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Please also note
any relevant details for your test configuration (e.g. compiler, OS). Include
enough information so someone can reproduce your tests.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New check tests, if applicable, are included
  • make distcheck passes

@rem1776 rem1776 marked this pull request as ready for review December 1, 2025 17:52
Copy link
Contributor

@vithikashah001 vithikashah001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs documentation for variables, subroutines and functions.

@J-Lentz J-Lentz self-requested a review January 5, 2026 18:50
Copy link
Contributor

@uramirez8707 uramirez8707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes, but I think this is a good start

integer, parameter :: non_domain_decomposed = 1 !< enumeration for type of domain
integer, parameter :: Unstructured_grid = 2 !< enumeration for type of domain

integer, parameter :: max_files = 10 !< amount of offloaded files to allocate space for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO Make this a namelist, runtime constant, etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added with 4a5ed59

!! Registers an axis to a netcdf file on offloaded PEs. File must have been opened with open_file_offload.
interface register_axis_offload
procedure :: register_netcdf_axis_offload
procedure :: register_domain_axis_offload
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO we need a register_unstructured_axis_offload for the unstructured grid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added any leftover TODOs as comments with 969513e


!< The number of tiles must be the same as the number of offloading pes
!if (size(pe_out) .ne. ntile ) &
! call mpp_error(FATAL, "The number of offloading PEs must be the same as the number of tiles of the domain")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be added back, and it should crash if the ntiles is not a multiple of the ntiles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added with 969513e

Comment on lines +121 to +123
call mpp_broadcast(filename_out, str_len, pe_in(1))
call mpp_broadcast(global_domain_size, size(global_domain_size), pe_in(1))
call mpp_broadcast(ntile, pe_in(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO maybe we can find a way to do this all in one mpp_brodcast call (or mpp_send/mpp_receive)

integer, allocatable :: all_current_pes(:)
integer, allocatable :: broadcasting_pes(:)
logical :: is_model_pe
character(len=255) :: var_info(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to use the max str length variable

Comment on lines 347 to 352
if (.not. is_model_pe) then
select type(wut=>this%fileobj)
type is(FmsNetcdfDomainFile_t)
call register_axis(wut, trim(var_info(1)), trim(var_info(2)))
end select
endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clean this up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type(domain2D) :: domain_out
type(domain2D) :: domain_in
integer :: isc, iec, jsc, jec, nz, redistribute_clock
character(len=255) :: varname_tmp(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use max str length

!> Structure to hold offloading file information
type :: offloading_obj_out
integer :: id
character(len=:), allocatable :: filename !< filename of the offloaded netcdf file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to store the filename here, as opposed to getting it from fileobj%path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No looks like it can be removed, I'll add that change.

character(len=:), allocatable :: filename !< filename of the offloaded netcdf file
class(FmsNetcdfFile_t), allocatable :: fileobj !< fms2_io file object
type(domain2D) :: domain_out !< domain on offloading PEs
integer :: type_of_domain !< type of domain (domain_decomposed, non_domain_decomposed, Unstructured_grid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem that type_of_domain is used. Is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to remove this as well.

integer :: type_of_domain !< type of domain (domain_decomposed, non_domain_decomposed, Unstructured_grid)
end type

!> Offload equivalent of open_file in fms2_io_mod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems like it belongs above open_file_offload.

type_of_domain = domain_decomposed
type is (FmsNetcdfUnstructuredDomainFile_t)
type_of_domain = Unstructured_grid
end select
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_of_domain is assigned, but never used.

endif

if (.not. is_model_pe) then
select type(wut=>this%fileobj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since FmsNetcdfDomainFile_t has no types derived from it, perhaps change fileobj from class(FmsNetcdfDomainFile_t) to type(FmsNetcdfDomainFile_t), and remove the select type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uramirez8707 had a similar comment too I believe.

Since the main derived type (offloading_obj_out) uses a class(FmsNetcdfFile_t), i was thinking it might be cleaner to just make all the routines take class(FmsNetcdfFile_t) instead and just error out for now in case its not a supported file type.

I figured that makes it a bit easier to use and will give us some flexibility in the future, but lmk what you guys think.

endif

if (.not. is_model_pe) then
select type(wut=>this%fileobj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment, regarding class(FmsNetcdfDomainFile_t) to type(FmsNetcdfDomainFile_t).

select type(wut=>this%fileobj)
type is(FmsNetcdfDomainFile_t)
call register_field(wut, trim(var_info(1)), trim(var_info(2)), var_info(3:))
end select
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to handle the case where fileobj is FmsNetcdfFile_t as opposed to FmsNetcdfDomainFile_t - is this case supported?

!! This routine should be called from both the model PEs and the offload PEs, with the full list
!! of pes for each group being provided. The model PEs will broadcast the filename and domain.
subroutine open_file_offload(fileobj, filename, domain_in, pe_in, pe_out)
class(FmsNetcdfFile_t), intent(inout) :: fileobj !< fms2_io file object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other subroutines (e.g., write_data_offload) only seem to support FmsNetcdfDomainFile_t and not the FmsNetcdfFile_t base type. If non-domain-decomposed files are unsupported, type(FmsNetcdfDomainFile_t) should probably be used consistently for fileobj arguments.

@rem1776
Copy link
Contributor Author

rem1776 commented Jan 12, 2026

@uramirez8707 @J-Lentz @vithikashah001 This should be ready for another pass at reviews, I think every comment should be covered. See #1774 (comment) for how i handled the typing, i just made everything take the base FmsNetcdfFile_type class for now with some type checking added.

@J-Lentz J-Lentz self-requested a review January 12, 2026 20:13
Copy link
Contributor

@uramirez8707 uramirez8707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for now, but we have a lot of clean up to do.

Let’s plan the next steps next week.

@vithikashah001 vithikashah001 merged commit 3d4ac8c into NOAA-GFDL:main Jan 13, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants